back-propagation algorithm
On the Principles of ReLU Networks with One Hidden Layer
A neural network with one hidden layer, also called a two-layer network (not counting the input layer), is the simplest feedforward neural network, and its mechanism may underlie more general network architectures. However, even this simple architecture is a "black box": it remains unclear how to interpret the mechanism of the solutions obtained by the back-propagation algorithm and how to control the training process in a deterministic way. This paper systematically studies the first problem by constructing universal function-approximation solutions. It is shown, both theoretically and experimentally, that the training solution for a one-dimensional input can be completely understood, and that the solution for a higher-dimensional input can also be well interpreted to some extent. These results pave the way for thoroughly revealing the black box of two-layer ReLU networks and advance the understanding of deep ReLU networks.
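To make the one-dimensional case concrete, here is a minimal sketch of a standard textbook construction, not necessarily the one used in the paper: a one-hidden-layer ReLU network that passes exactly through a set of sample points. The function names `build_relu_interpolant` and `evaluate` are hypothetical, and the sketch assumes the sample inputs are strictly increasing.

```python
def relu(value):
    """Rectified linear unit."""
    return max(0.0, value)


def build_relu_interpolant(xs, ys):
    """Build a one-hidden-layer ReLU network that interpolates (xs, ys).

    Assumes xs is strictly increasing. Each hidden unit computes
    relu(x - knot), and the output is
        bias + sum(coeff * relu(x - knot)).
    Returns (bias, knots, coeffs).
    """
    # Slope of each linear segment between neighbouring samples.
    slopes = [
        (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    ]
    # Each hidden unit contributes the change in slope at its knot.
    coeffs = [slopes[0]] + [
        slopes[i] - slopes[i - 1] for i in range(1, len(slopes))
    ]
    knots = list(xs[:-1])
    return ys[0], knots, coeffs


def evaluate(network, x):
    """Forward pass of the constructed network at a scalar input x."""
    bias, knots, coeffs = network
    return bias + sum(c * relu(x - k) for k, c in zip(knots, coeffs))


network = build_relu_interpolant([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 0.0, 2.0])
print(evaluate(network, 1.5))
```

Between the first and last sample the network output is the piecewise-linear interpolant of the data, which is why a one-dimensional ReLU solution can be read off knot by knot.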
Introduction to Machine Learning
This book introduces the mathematical foundations and techniques that lead to the development and analysis of many of the algorithms used in machine learning. It starts with an introductory chapter that describes the notation used throughout the book and serves as a reminder of basic concepts in calculus, linear algebra and probability. It also introduces some measure-theoretic terminology, which can be used as a reading guide for the sections that use these tools. The introductory chapters also provide background material on matrix analysis and optimization. The latter chapter provides theoretical support for many algorithms that are used in the book, including stochastic gradient descent, proximal methods, etc. After discussing basic concepts for statistical prediction, the book includes an introduction to reproducing kernel theory and Hilbert space techniques, which are used in many places, before addressing the description of various algorithms for supervised statistical learning, including linear methods, support vector machines, decision trees, boosting, and neural networks. The subject then switches to generative methods, starting with a chapter that presents sampling methods and an introduction to the theory of Markov chains. The following chapters describe the theory of graphical models, an introduction to variational methods for models with latent variables, and deep-learning based generative models. The next chapters focus on unsupervised learning methods for clustering, factor analysis and manifold learning. The final chapter of the book is theory-oriented and discusses concentration inequalities and generalization bounds.
Learning by the F-adjoint
A recent paper by Boughammoura (2023) describes the back-propagation algorithm in terms of an alternative formulation called the F-adjoint method. In particular, with the F-adjoint algorithm, computing the loss gradient with respect to each weight in the network is straightforward. In this work, we develop and investigate this theoretical framework to improve supervised learning algorithms for feed-forward neural networks. Our main result is that, by introducing a neural dynamical model combined with the gradient descent algorithm, we derive an equilibrium F-adjoint process that yields a local learning rule for deep feed-forward networks. Experimental results on the MNIST and Fashion-MNIST datasets demonstrate that the proposed approach provides significant improvements over the standard back-propagation training procedure.
A Back-Propagation Algorithm with Optimal Use of Hidden Units
This paper presents a variation of the back-propagation algorithm that makes optimal use of a network's hidden units by decreasing an "energy" term written as a function of the squared activations of these hidden units. The algorithm can automatically find optimal or nearly optimal architectures necessary to solve known Boolean functions, facilitate the interpretation of the activation of the remaining hidden units and automatically estimate the complexity of architectures appropriate for phonetic labeling problems. The general principle of the algorithm can also be adapted to different tasks: for example, it can be used to eliminate the [0, 0] local minimum of the [-1.
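The abstract does not give the exact form of the energy term, so the following is only a sketch under an explicit assumption: the energy is taken to be a plain weighted sum of squared hidden activations, with a hypothetical weight `mu`. It shows how such a term would add an extra component to the error signal that back-propagation sends through each hidden unit.

```python
def energy(hidden, mu=0.1):
    """Assumed energy: mu times the sum of squared hidden activations."""
    return mu * sum(h * h for h in hidden)


def energy_gradient(hidden, mu=0.1):
    """Derivative of the assumed energy with respect to each activation."""
    return [2.0 * mu * h for h in hidden]


def penalized_error_signal(error_signal, hidden, mu=0.1):
    """Add the energy gradient to the usual back-propagated error signal.

    error_signal[j] is d(output error)/d(hidden[j]) from ordinary
    back-propagation; the sum is the signal for the penalized cost.
    """
    return [e + g for e, g in zip(error_signal, energy_gradient(hidden, mu))]


hidden = [0.9, -0.1, 0.0]
print(penalized_error_signal([0.2, 0.0, -0.3], hidden))
```

Under this assumption, units with large activations that the task does not need are pushed toward zero, which is one way to read the abstract's claim about finding smaller architectures.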
An Efficient Implementation of the Back-propagation Algorithm on the Connection Machine CM-2
In this paper, we present a novel implementation of the widely used back-propagation neural net learning algorithm on the Connection Machine CM-2, a general-purpose, massively parallel computer with a hypercube topology. This implementation runs at about 180 million interconnections per second (IPS) on a 64K-processor CM-2. The main interprocessor communication operation used is 2D nearest-neighbor communication. The techniques developed here can be easily extended to implement other algorithms for layered neural nets on the CM-2, or on other massively parallel computers which have 2D or higher degree connections among their processors.
Artificial Neural Networks Connects missing Dots between MSA & LSA
The machine learning model classifies lithic assemblages of Eastern Africa to identify major incidents of history. Archaic incidents have shaped the existence of human civilization. They tell us about the evolution of humans over centuries. Understanding these incidents helps us peek into the past. Moreover, it helps in analyzing the various processes that led to the development of humans.
Neural Networks Without Matrix Math
The challenge of speeding up AI systems typically means adding more processing elements and pruning the algorithms, but those approaches aren't the only path forward. Almost all commercial machine learning applications depend on artificial neural networks, which are trained using large datasets with a back-propagation algorithm. During training, the network's output for each example is compared to the known "correct" answer, and the difference between the two is used to adjust the weights applied to the network nodes. The process repeats for as many training examples as needed to (hopefully) converge to a stable set of weights that gives acceptable accuracy. This standard algorithm requires two distinct computational paths -- a forward "inference" path to analyze the data, and a backward "gradient descent" path to correct node weights.
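The two computational paths described above can be sketched in a few lines of plain Python. This is an illustrative toy, not code from the article: the network shape (one scalar input, a small tanh hidden layer, one linear output), the squared-error loss, the learning rate and all function names are assumptions chosen to keep the example short.

```python
import math
import random


def init_params(hidden_size, rng):
    """Random small weights, zero biases: [w1, b1, w2, b2]."""
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
    b1 = [0.0] * hidden_size
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
    return [w1, b1, w2, 0.0]


def forward(x, params):
    """Forward ("inference") path: input -> hidden activations -> output."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    output = sum(v * h for v, h in zip(w2, hidden)) + b2
    return hidden, output


def backward(x, target, params):
    """Backward path: gradients of 0.5 * (output - target)**2."""
    w1, b1, w2, b2 = params
    hidden, output = forward(x, params)
    error = output - target  # difference from the "correct" answer
    grad_w2 = [error * h for h in hidden]
    # Propagate the error through the output weights and tanh derivative.
    delta = [error * v * (1.0 - h * h) for v, h in zip(w2, hidden)]
    grad_w1 = [d * x for d in delta]
    return grad_w1, delta, grad_w2, error


def mean_loss(data, params):
    return sum(0.5 * (forward(x, params)[1] - y) ** 2 for x, y in data) / len(data)


def train(data, params, lr=0.05, epochs=200):
    """Repeat forward and backward passes, adjusting weights per example."""
    w1, b1, w2, b2 = params
    for _ in range(epochs):
        for x, y in data:
            g_w1, g_b1, g_w2, g_b2 = backward(x, y, [w1, b1, w2, b2])
            w1 = [w - lr * g for w, g in zip(w1, g_w1)]
            b1 = [b - lr * g for b, g in zip(b1, g_b1)]
            w2 = [w - lr * g for w, g in zip(w2, g_w2)]
            b2 = b2 - lr * g_b2
    return [w1, b1, w2, b2]


data = [(x / 4.0, (x / 4.0) ** 2) for x in range(-4, 5)]
start = init_params(3, random.Random(0))
trained = train(data, start)
print(mean_loss(data, start), mean_loss(data, trained))
```

Note that `backward` has to re-run `forward` to obtain the hidden activations; this dependence of the backward path on stored forward results is the coupling between the two paths that the paragraph describes.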
The Who's Who Of Machine Learning, And Why You Should Know Them
"AI is the new electricity." If you're a machine learning and AI enthusiast, you definitely must know this guy. He is best known for his machine learning course on Coursera which, for many, has been the first step in understanding artificial intelligence. Andrew has been teaching at Stanford ever since he got his PhD in 2002. He founded and led the Google Brain team, which is considered one of the most progressive ML/AI research organisations in the world. He also founded the popular massive open online course (MOOC) site Coursera, which now has over a thousand courses taught by Ivy League professors.
Alternating Back-Propagation for Generator Network
Han, Tian (University of California, Los Angeles) | Lu, Yang (University of California, Los Angeles) | Zhu, Song-Chun (University of California, Los Angeles) | Wu, Ying Nian (University of California, Los Angeles)
This paper proposes an alternating back-propagation algorithm for learning the generator network model. The model is a non-linear generalization of factor analysis. In this model, the mapping from the continuous latent factors to the observed signal is parametrized by a convolutional neural network. The alternating back-propagation algorithm iterates the following two steps: (1) Inferential back-propagation, which infers the latent factors by Langevin dynamics or gradient descent. (2) Learning back-propagation, which updates the parameters given the inferred latent factors by gradient descent. The gradient computations in both steps are powered by back-propagation, and they share most of their code in common. We show that the alternating back-propagation algorithm can learn realistic generator models of natural images, video sequences, and sounds. Moreover, it can also be used to learn from incomplete or indirect training data.
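The two alternating steps can be illustrated with a deliberately simplified sketch. It is not the paper's implementation: the generator here is linear (plain factor analysis with a single latent factor) rather than a convolutional network, the prior on the latent factors is omitted, and plain gradient descent stands in for Langevin dynamics. The function names, step sizes and toy data are all assumptions.

```python
def reconstruction_loss(data, weights, latents):
    """0.5 * squared error between each x and its reconstruction weights * z."""
    total = 0.0
    for x, z in zip(data, latents):
        total += 0.5 * sum((xi - wi * z) ** 2 for xi, wi in zip(x, weights))
    return total


def alternating_backprop(data, weights, steps=200, lr_latent=0.1, lr_weights=0.1):
    """Alternate between inferring latents and learning generator weights."""
    weights = list(weights)
    latents = [0.0 for _ in data]
    history = [reconstruction_loss(data, weights, latents)]
    for _ in range(steps):
        # (1) Inferential step: gradient descent on each latent factor,
        # holding the generator weights fixed.
        for i, x in enumerate(data):
            residual = [xi - wi * latents[i] for xi, wi in zip(x, weights)]
            latents[i] += lr_latent * sum(
                wi * ri for wi, ri in zip(weights, residual)
            )
        # (2) Learning step: gradient descent on the weights,
        # holding the inferred latent factors fixed.
        grad = [0.0] * len(weights)
        for x, z in zip(data, latents):
            for d, (xi, wi) in enumerate(zip(x, weights)):
                grad[d] += (xi - wi * z) * z
        weights = [wi + lr_weights * g for wi, g in zip(weights, grad)]
        history.append(reconstruction_loss(data, weights, latents))
    return weights, latents, history


# Toy observations that lie on a single line through the origin.
data = [[-1.0, -2.0], [-0.5, -1.0], [0.5, 1.0], [1.0, 2.0]]
weights, latents, history = alternating_backprop(data, [0.5, 0.5])
print(history[0], history[-1])
```

Both steps differentiate the same reconstruction error, once with respect to the latent factors and once with respect to the weights, which mirrors the abstract's remark that the two back-propagation passes share most of their code.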